Overview

Dataset statistics

Number of variables: 16
Number of observations: 787
Missing cells: 1159
Missing cells (%): 9.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 98.5 KiB
Average record size in memory: 128.2 B
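
The two derived figures in the dataset statistics can be checked from the raw counts. The sketch below assumes (the report does not state this) that the missing-cell percentage is taken over all variables times observations, and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Counts copied from the dataset statistics in this report.
n_variables = 16
n_observations = 787
missing_cells = 1159
total_size_kib = 98.5  # already rounded in the report

# Assumption: every variable/observation pair counts as one cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values the report shows (9.2% and 128.2 B).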

Variable types

NUM: 12
CAT: 4

Reproduction

Analysis started: 2020-09-12 13:26:47.120394
Analysis finished: 2020-09-12 13:28:17.488046
Duration: 1 minute and 30.37 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

City has a high cardinality: 772 distinct values (High cardinality)
Popuation [2001] is highly correlated with Population [2011] and 1 other field (High correlation)
Population [2011] is highly correlated with Popuation [2001] and 1 other field (High correlation)
Female Population is highly correlated with Population [2011] and 1 other field (High correlation)
Population [2011] has 48 (6.1%) missing values (Missing)
Popuation [2001] has 492 (62.5%) missing values (Missing)
Sex Ratio has 10 (1.3%) missing values (Missing)
Median Age has 18 (2.3%) missing values (Missing)
Avg Temp has 17 (2.2%) missing values (Missing)
Toilets Avl has 26 (3.3%) missing values (Missing)
Water Purity has 158 (20.1%) missing values (Missing)
H Index has 140 (17.8%) missing values (Missing)
Female Population has 141 (17.9%) missing values (Missing)
# of hospitals has 15 (1.9%) missing values (Missing)
Foreign Visitors has 90 (11.4%) missing values (Missing)
City is uniformly distributed (Uniform)

Variables

City
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 772
Unique (%): 98.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value | Count | Frequency (%)
Aurangabad | 3 | 0.4%
Ramnagar | 3 | 0.4%
Phagwara | 2 | 0.3%
Kavali | 2 | 0.3%
Tezpur | 2 | 0.3%
Tinsukia | 2 | 0.3%
Jorhat | 2 | 0.3%
Tiruppur | 2 | 0.3%
Thrissur | 2 | 0.3%
Miryalaguda | 2 | 0.3%
Other values (762) | 765 | 97.2%

Length

Max length: 28
Median length: 8
Mean length: 8.363405337
Min length: 3

State
Categorical

Distinct count: 33
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value | Count | Frequency (%)
Andhra Pradesh | 78 | 9.9%
Maharashtra | 73 | 9.3%
Uttar Pradesh | 67 | 8.5%
Tamil Nadu | 63 | 8.0%
Bihar | 51 | 6.5%
Karnataka | 44 | 5.6%
Madhya Pradesh | 43 | 5.5%
West Bengal | 42 | 5.3%
Gujarat | 41 | 5.2%
Kerala | 39 | 5.0%
Other values (23) | 246 | 31.3%

Length

Max length: 27
Median length: 10
Mean length: 9.673443456
Min length: 3

Type
Categorical

Distinct count: 37
Unique (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value | Count | Frequency (%)
C-1T | 269 | 34.2%
M | 236 | 30.0%
M.Cl | 59 | 7.5%
MPUA | 44 | 5.6%
M.B | 28 | 3.6%
UA | 28 | 3.6%
N.P.P | 13 | 1.7%
T.M.C | 13 | 1.7%
N.P | 10 | 1.3%
C.T | 9 | 1.1%
Other values (27) | 78 | 9.9%

Length

Max length: 14
Median length: 4
Mean length: 3.015247776
Min length: 1

Population [2011]
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 730
Unique (%): 98.8%
Missing: 48
Missing (%): 6.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 310283.4167794317
Minimum: 36776.0
Maximum: 12442373.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 36776
5-th percentile: 38989.8
Q1: 52550
Median: 79106
Q3: 237476.5
95-th percentile: 1121730.6
Maximum: 12442373
Range: 12405597
Interquartile range (IQR): 184926.5

Descriptive statistics

Standard deviation: 887484.8744
Coefficient of variation (CV): 2.860239466
Kurtosis: 92.3470521
Mean: 310283.4168
Median Absolute Deviation (MAD): 36108
Skewness: 8.579129273
Sum: 229299445
Variance: 7.876294023e+11
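
Several of the figures listed for Population [2011] are derived from the others. As a minimal sketch, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), they can be recomputed from the rounded values printed in the report:

```python
# Values copied from the Population [2011] statistics in this report.
minimum, maximum = 36776, 12442373
q1, q3 = 52550, 237476.5
mean = 310283.4168
std = 887484.8744

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values
cv = std / mean                   # dispersion relative to the mean
variance = std ** 2               # square of the standard deviation

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```

A CV well above 1 together with a median (79106) far below the mean is consistent with the strong right skew reported for this variable.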
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
70777 | 2 | 0.3%
49985 | 2 | 0.3%
38554 | 2 | 0.3%
61632 | 2 | 0.3%
206167 | 2 | 0.3%
42461 | 2 | 0.3%
45858 | 2 | 0.3%
65232 | 2 | 0.3%
37802 | 2 | 0.3%
44314 | 1 | 0.1%
Other values (720) | 720 | 91.5%
(Missing) | 48 | 6.1%

Minimum 5 values

Value | Count | Frequency (%)
36776 | 1 | 0.1%
36805 | 1 | 0.1%
36828 | 1 | 0.1%
36947 | 1 | 0.1%
36954 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
12442373 | 1 | 0.1%
11007835 | 1 | 0.1%
8436675 | 1 | 0.1%
6809970 | 1 | 0.1%
5570585 | 1 | 0.1%

Popuation [2001]
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 292
Unique (%): 99.0%
Missing: 492
Missing (%): 62.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 532045.1322033898
Minimum: 29354.0
Maximum: 11978450.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 29354
5-th percentile: 99573
Q1: 169432
Median: 236600
Q3: 474585
95-th percentile: 1410133.1
Maximum: 11978450
Range: 11949096
Interquartile range (IQR): 305153

Descriptive statistics

Standard deviation: 1067831.381
Coefficient of variation (CV): 2.007031577
Kurtosis: 65.66343066
Mean: 532045.1322
Median Absolute Deviation (MAD): 97282
Skewness: 7.26489863
Sum: 156953314
Variance: 1.140263858e+12

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
271811 | 2 | 0.3%
296662 | 2 | 0.3%
228175 | 2 | 0.3%
260906 | 1 | 0.1%
269122 | 1 | 0.1%
165212 | 1 | 0.1%
310967 | 1 | 0.1%
231515 | 1 | 0.1%
166125 | 1 | 0.1%
426674 | 1 | 0.1%
Other values (282) | 282 | 35.8%
(Missing) | 492 | 62.5%

Minimum 5 values

Value | Count | Frequency (%)
29354 | 1 | 0.1%
73455 | 1 | 0.1%
79190 | 1 | 0.1%
79393 | 1 | 0.1%
81503 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
11978450 | 1 | 0.1%
9879172 | 1 | 0.1%
4572876 | 1 | 0.1%
4343645 | 1 | 0.1%
4301326 | 1 | 0.1%

Sex Ratio
Real number (ℝ≥0)

MISSING

Distinct count: 160
Unique (%): 20.6%
Missing: 10
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 905.7129987129987
Minimum: 818.0
Maximum: 1042.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 818
5-th percentile: 846
Q1: 877
Median: 906
Q3: 928
95-th percentile: 968
Maximum: 1042
Range: 224
Interquartile range (IQR): 51

Descriptive statistics

Standard deviation: 37.01854158
Coefficient of variation (CV): 0.04087226488
Kurtosis: -0.08383561613
Mean: 905.7129987
Median Absolute Deviation (MAD): 25
Skewness: 0.2218225024
Sum: 703739
Variance: 1370.372421

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
923 | 19 | 2.4%
871 | 15 | 1.9%
922 | 14 | 1.8%
872 | 14 | 1.8%
929 | 11 | 1.4%
916 | 11 | 1.4%
882 | 11 | 1.4%
890 | 11 | 1.4%
917 | 10 | 1.3%
869 | 10 | 1.3%
Other values (150) | 651 | 82.7%

Minimum 5 values

Value | Count | Frequency (%)
818 | 1 | 0.1%
820 | 1 | 0.1%
821 | 1 | 0.1%
822 | 1 | 0.1%
823 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1042 | 1 | 0.1%
1036 | 1 | 0.1%
1031 | 1 | 0.1%
1023 | 1 | 0.1%
1019 | 1 | 0.1%

Median Age
Real number (ℝ≥0)

MISSING

Distinct count: 10
Unique (%): 1.3%
Missing: 18
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.18335500650195
Minimum: 23.0
Maximum: 32.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 23
5-th percentile: 23
Q1: 24
Median: 26
Q3: 28
95-th percentile: 29
Maximum: 32
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.113062962
Coefficient of variation (CV): 0.08070252884
Kurtosis: -1.015686277
Mean: 26.18335501
Median Absolute Deviation (MAD): 2
Skewness: 0.1090339961
Sum: 20135
Variance: 4.465035083

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
28 | 117 | 14.9%
25 | 116 | 14.7%
24 | 108 | 13.7%
29 | 108 | 13.7%
26 | 101 | 12.8%
27 | 98 | 12.5%
23 | 97 | 12.3%
30 | 14 | 1.8%
31 | 8 | 1.0%
32 | 2 | 0.3%
(Missing) | 18 | 2.3%

Minimum 5 values

Value | Count | Frequency (%)
23 | 97 | 12.3%
24 | 108 | 13.7%
25 | 116 | 14.7%
26 | 101 | 12.8%
27 | 98 | 12.5%

Maximum 5 values

Value | Count | Frequency (%)
32 | 2 | 0.3%
31 | 8 | 1.0%
30 | 14 | 1.8%
29 | 108 | 13.7%
28 | 117 | 14.9%

Avg Temp
Real number (ℝ≥0)

MISSING

Distinct count: 26
Unique (%): 3.4%
Missing: 17
Missing (%): 2.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.941558441558442
Minimum: 5.0
Maximum: 40.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 5
5-th percentile: 13.9
Q1: 28
Median: 31
Q3: 36
95-th percentile: 40
Maximum: 40
Range: 35
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.968288763
Coefficient of variation (CV): 0.2252080733
Kurtosis: 3.116849321
Mean: 30.94155844
Median Absolute Deviation (MAD): 4
Skewness: -1.489797407
Sum: 23825
Variance: 48.55704828

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
30 | 57 | 7.2%
37 | 53 | 6.7%
26 | 53 | 6.7%
29 | 50 | 6.4%
33 | 49 | 6.2%
40 | 47 | 6.0%
28 | 47 | 6.0%
25 | 47 | 6.0%
38 | 45 | 5.7%
35 | 45 | 5.7%
Other values (16) | 277 | 35.2%

Minimum 5 values

Value | Count | Frequency (%)
5 | 5 | 0.6%
6 | 4 | 0.5%
7 | 2 | 0.3%
8 | 7 | 0.9%
9 | 4 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
40 | 47 | 6.0%
39 | 35 | 4.4%
38 | 45 | 5.7%
37 | 53 | 6.7%
36 | 30 | 3.8%

SWM
Categorical

Distinct count: 3
Unique (%): 0.4%
Missing: 4
Missing (%): 0.5%
Memory size: 6.1 KiB

Value | Count | Frequency (%)
HIGH | 272 | 34.6%
LOW | 260 | 33.0%
MEDIUM | 251 | 31.9%
(Missing) | 4 | 0.5%

Length

Max length: 6
Median length: 4
Mean length: 4.302414231
Min length: 3

Toilets Avl
Real number (ℝ≥0)

MISSING

Distinct count: 107
Unique (%): 14.1%
Missing: 26
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 96.08672798948751
Minimum: 50.0
Maximum: 227.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 50
5-th percentile: 54
Q1: 70
Median: 92
Q3: 119
95-th percentile: 146
Maximum: 227
Range: 177
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 30.5329907
Coefficient of variation (CV): 0.3177649124
Kurtosis: 0.4551361719
Mean: 96.08672799
Median Absolute Deviation (MAD): 24
Skewness: 0.6519525763
Sum: 73122
Variance: 932.263521

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
65 | 15 | 1.9%
90 | 15 | 1.9%
66 | 15 | 1.9%
100 | 14 | 1.8%
91 | 14 | 1.8%
61 | 14 | 1.8%
70 | 13 | 1.7%
99 | 13 | 1.7%
92 | 12 | 1.5%
57 | 12 | 1.5%
Other values (97) | 624 | 79.3%
(Missing) | 26 | 3.3%

Minimum 5 values

Value | Count | Frequency (%)
50 | 6 | 0.8%
51 | 6 | 0.8%
52 | 9 | 1.1%
53 | 10 | 1.3%
54 | 10 | 1.3%

Maximum 5 values

Value | Count | Frequency (%)
227 | 1 | 0.1%
219 | 1 | 0.1%
217 | 1 | 0.1%
215 | 1 | 0.1%
212 | 1 | 0.1%

Water Purity
Real number (ℝ≥0)

MISSING

Distinct count: 101
Unique (%): 16.1%
Missing: 158
Missing (%): 20.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 150.37360890302068
Minimum: 100.0
Maximum: 200.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 100
5-th percentile: 105
Q1: 125
Median: 150
Q3: 176
95-th percentile: 195
Maximum: 200
Range: 100
Interquartile range (IQR): 51

Descriptive statistics

Standard deviation: 29.06376698
Coefficient of variation (CV): 0.1932770463
Kurtosis: -1.244995104
Mean: 150.3736089
Median Absolute Deviation (MAD): 25
Skewness: -0.01605517129
Sum: 94585
Variance: 844.7025508

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
134 | 12 | 1.5%
129 | 12 | 1.5%
172 | 11 | 1.4%
116 | 11 | 1.4%
105 | 11 | 1.4%
115 | 11 | 1.4%
125 | 10 | 1.3%
173 | 10 | 1.3%
176 | 9 | 1.1%
186 | 9 | 1.1%
Other values (91) | 523 | 66.5%
(Missing) | 158 | 20.1%

Minimum 5 values

Value | Count | Frequency (%)
100 | 3 | 0.4%
101 | 5 | 0.6%
102 | 8 | 1.0%
103 | 3 | 0.4%
104 | 3 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
200 | 5 | 0.6%
199 | 8 | 1.0%
198 | 6 | 0.8%
197 | 9 | 1.1%
196 | 1 | 0.1%

H Index
Real number (ℝ≥0)

MISSING

Distinct count: 647
Unique (%): 100.0%
Missing: 140
Missing (%): 17.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4970691911347755
Minimum: 0.0030743436420811454
Maximum: 0.9997737154818801
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.003074343642
5-th percentile: 0.04402641715
Q1: 0.2385864179
Median: 0.5070035548
Q3: 0.7525169391
95-th percentile: 0.9474487944
Maximum: 0.9997737155
Range: 0.9966993718
Interquartile range (IQR): 0.5139305211

Descriptive statistics

Standard deviation: 0.2934213852
Coefficient of variation (CV): 0.5903029003
Kurtosis: -1.262279578
Mean: 0.4970691911
Median Absolute Deviation (MAD): 0.2518362572
Skewness: -0.001863978178
Sum: 321.6037667
Variance: 0.08609610929

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.008755664986 | 1 | 0.1%
0.04044734833 | 1 | 0.1%
0.43898665 | 1 | 0.1%
0.09457941448 | 1 | 0.1%
0.2622769277 | 1 | 0.1%
0.7621123303 | 1 | 0.1%
0.1750121394 | 1 | 0.1%
0.650818141 | 1 | 0.1%
0.6792644647 | 1 | 0.1%
0.9076620147 | 1 | 0.1%
Other values (637) | 637 | 80.9%
(Missing) | 140 | 17.8%

Minimum 5 values

Value | Count | Frequency (%)
0.003074343642 | 1 | 0.1%
0.004921502153 | 1 | 0.1%
0.005168423715 | 1 | 0.1%
0.006412195407 | 1 | 0.1%
0.008755664986 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
0.9997737155 | 1 | 0.1%
0.999185528 | 1 | 0.1%
0.9991392004 | 1 | 0.1%
0.9991114073 | 1 | 0.1%
0.9987569952 | 1 | 0.1%

Female Population
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 645
Unique (%): 99.8%
Missing: 141
Missing (%): 17.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 291001.13931888546
Minimum: 30913.0
Maximum: 10924403.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 30913
5-th percentile: 35146.75
Q1: 45144.5
Median: 83067.5
Q3: 220677.25
95-th percentile: 978485.25
Maximum: 10924403
Range: 10893490
Interquartile range (IQR): 175532.75

Descriptive statistics

Standard deviation: 835434.7538
Coefficient of variation (CV): 2.870898567
Kurtosis: 80.30715994
Mean: 291001.1393
Median Absolute Deviation (MAD): 45648
Skewness: 8.09154773
Sum: 187986736
Variance: 6.979512278e+11

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
35139 | 2 | 0.3%
55061 | 1 | 0.1%
392871 | 1 | 0.1%
218707 | 1 | 0.1%
971083 | 1 | 0.1%
482981 | 1 | 0.1%
38025 | 1 | 0.1%
46058 | 1 | 0.1%
204837 | 1 | 0.1%
108346 | 1 | 0.1%
Other values (635) | 635 | 80.7%
(Missing) | 141 | 17.9%

Minimum 5 values

Value | Count | Frequency (%)
30913 | 1 | 0.1%
31263 | 1 | 0.1%
32277 | 1 | 0.1%
32694 | 1 | 0.1%
32973 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
10924403 | 1 | 0.1%
9444722 | 1 | 0.1%
7896728 | 1 | 0.1%
6333272 | 1 | 0.1%
4746138 | 1 | 0.1%

# of hospitals
Real number (ℝ≥0)

MISSING

Distinct count: 76
Unique (%): 9.8%
Missing: 15
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.84974093264249
Minimum: 10.0
Maximum: 159.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 10
5-th percentile: 11
Q1: 18
Median: 28
Q3: 67
95-th percentile: 94
Maximum: 159
Range: 149
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 29.08693909
Coefficient of variation (CV): 0.6950327158
Kurtosis: -0.6315321958
Mean: 41.84974093
Median Absolute Deviation (MAD): 15.5
Skewness: 0.7387807669
Sum: 32308
Variance: 846.0500259

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
26 | 34 | 4.3%
10 | 31 | 3.9%
13 | 28 | 3.6%
29 | 27 | 3.4%
30 | 27 | 3.4%
11 | 24 | 3.0%
20 | 23 | 2.9%
28 | 23 | 2.9%
22 | 22 | 2.8%
15 | 22 | 2.8%
Other values (66) | 511 | 64.9%

Minimum 5 values

Value | Count | Frequency (%)
10 | 31 | 3.9%
11 | 24 | 3.0%
12 | 21 | 2.7%
13 | 28 | 3.6%
14 | 19 | 2.4%

Maximum 5 values

Value | Count | Frequency (%)
159 | 1 | 0.1%
148 | 1 | 0.1%
123 | 1 | 0.1%
110 | 1 | 0.1%
100 | 3 | 0.4%

Foreign Visitors
Real number (ℝ≥0)

MISSING

Distinct count: 32
Unique (%): 4.6%
Missing: 90
Missing (%): 11.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 1457944.992826399
Minimum: 798.0
Maximum: 4684707.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 798
5-th percentile: 24720
Q1: 237854
Median: 636502
Q3: 3104060
95-th percentile: 4684707
Maximum: 4684707
Range: 4683909
Interquartile range (IQR): 2866206

Descriptive statistics

Standard deviation: 1664151.074
Coefficient of variation (CV): 1.141436119
Kurtosis: -0.5868366511
Mean: 1457944.993
Median Absolute Deviation (MAD): 510424
Skewness: 1.027530692
Sum: 1016187660
Variance: 2.769398798e+12

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
237854 | 72 | 9.1%
4408916 | 68 | 8.6%
4684707 | 59 | 7.5%
3104060 | 56 | 7.1%
923737 | 46 | 5.8%
636502 | 41 | 5.2%
977479 | 39 | 5.0%
284973 | 36 | 4.6%
421365 | 34 | 4.3%
126078 | 32 | 4.1%
Other values (22) | 214 | 27.2%
(Missing) | 90 | 11.4%

Minimum 5 values

Value | Count | Frequency (%)
798 | 2 | 0.3%
2769 | 2 | 0.3%
3260 | 1 | 0.1%
6394 | 10 | 1.3%
8027 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4684707 | 59 | 7.5%
4408916 | 68 | 8.6%
3104060 | 56 | 7.1%
2379169 | 2 | 0.3%
1489500 | 31 | 3.9%

Covid Cases
Real number (ℝ≥0)

Distinct count: 642
Unique (%): 81.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6615.646759847522
Minimum: 334
Maximum: 218502
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 334
5-th percentile: 1939.1
Q1: 2270
Median: 2582
Q3: 8761
95-th percentile: 13926
Maximum: 218502
Range: 218168
Interquartile range (IQR): 6491

Descriptive statistics

Standard deviation: 15108.10276
Coefficient of variation (CV): 2.283692481
Kurtosis: 99.52218115
Mean: 6615.64676
Median Absolute Deviation (MAD): 486
Skewness: 9.241026721
Sum: 5206514
Variance: 228254769

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2523 | 6 | 0.8%
2208 | 5 | 0.6%
2490 | 5 | 0.6%
2220 | 4 | 0.5%
2213 | 4 | 0.5%
2576 | 4 | 0.5%
2265 | 3 | 0.4%
2430 | 3 | 0.4%
2081 | 3 | 0.4%
2323 | 3 | 0.4%
Other values (632) | 747 | 94.9%

Minimum 5 values

Value | Count | Frequency (%)
334 | 1 | 0.1%
358 | 1 | 0.1%
428 | 1 | 0.1%
438 | 1 | 0.1%
449 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
218502 | 1 | 0.1%
163115 | 1 | 0.1%
150793 | 1 | 0.1%
145606 | 2 | 0.3%
141000 | 1 | 0.1%

Interactions

[Interaction plots: 144 Matplotlib SVG figures, not reproduced in this text.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
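
The calculation described above can be sketched in plain Python. This is an illustration of the formula, not the code pandas-profiling itself runs; the function name `pearson_r` is hypothetical, and the inputs are the Population [2011] and Female Population values of the first five sample rows in this report. The sketch assumes neither input is constant.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# First five sample rows of this report (Mumbai .. Ahmedabad).
population_2011 = [12442373, 11007835, 8436675, 6809970, 5570585]
female_population = [10924403, 9444722, 7896728, 6333272, 4746138]

r = pearson_r(population_2011, female_population)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * v + 3 for v in population_2011], female_population)
print(round(r, 4), round(r_rescaled, 4))
```

On these five rows r is close to 1, in line with the high-correlation warning for the two columns, although five rows are far too few to stand in for the full dataset.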

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
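
As a minimal sketch of that definition, the code below ranks both variables and then applies the Pearson formula to the ranks. The helper names are hypothetical, ties receive the average of the ranks they span (a common convention that the report does not spell out), and the example data is made up to show a nonlinear but monotonic relationship.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # monotonic, but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, round(r, 3))
```

Here ρ reaches 1 because the ordering of y matches the ordering of x exactly, while r stays below 1 because the relationship is not a straight line.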

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
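
The pair-counting definition can be written out directly. The sketch below implements exactly the formula as stated here, which is usually called tau-a; library implementations often apply a tie correction (tau-b) instead, so results on tied data may differ from what the report shows. The function name and the toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move the same way
            concordant += 1
        elif direction < 0:      # the variables move in opposite ways
            discordant += 1
        # direction == 0 means a tie, counted as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]                 # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = (5 - 1) / 6.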

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
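
A minimal sketch of a bias-corrected Cramér's V follows. It uses the correction as it is commonly stated for Bergsma's estimator (shrinking φ² and the table dimensions by terms in 1/(n - 1)); whether pandas-profiling applies exactly these steps is an assumption, and the function name and example tables are made up for illustration.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Assumed form of the correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

independent = [[10, 10], [10, 10]]   # rows and columns unrelated
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column
v_independent = cramers_v_corrected(independent)
v_perfect = cramers_v_corrected(perfect)
print(v_independent, round(v_perfect, 6))
```

The two toy tables hit the ends of the range: the independent table gives 0 and the perfectly associated table gives 1.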

Missing values

[Missing-value plots: 4 Matplotlib SVG figures, not reproduced in this text.]

Sample

First rows

Index | City | State | Type | Population [2011] | Popuation [2001] | Sex Ratio | Median Age | Avg Temp | SWM | Toilets Avl | Water Purity | H Index | Female Population | # of hospitals | Foreign Visitors | Covid Cases
0 | Mumbai | Maharashtra | M.C | 12442373.0 | 11978450.0 | 878.0 | 23.0 | 32.0 | MEDIUM | 219.0 | 150.0 | 0.700440 | 10924403.0 | 159.0 | 4408916.0 | 163115
1 | Delhi | Delhi | M.C | 11007835.0 | 9879172.0 | 858.0 | 27.0 | 30.0 | MEDIUM | 215.0 | 196.0 | 0.920018 | 9444722.0 | 148.0 | 2379169.0 | 80188
2 | Bangalore | Karnataka | MPUA | 8436675.0 | 4301326.0 | 936.0 | 28.0 | 37.0 | HIGH | 212.0 | 102.0 | 0.097085 | 7896728.0 | 123.0 | 636502.0 | 141000
3 | Hyderabad | Telangana | MPUA | 6809970.0 | 3637483.0 | 930.0 | 23.0 | 31.0 | MEDIUM | 217.0 | 118.0 | 0.827744 | 6333272.0 | 110.0 | 126078.0 | 55123
4 | Ahmedabad | Gujarat | MPUA | 5570585.0 | 3520085.0 | 852.0 | 29.0 | 25.0 | LOW | 227.0 | 109.0 | 0.847941 | 4746138.0 | 73.0 | 284973.0 | 33204
5 | Chennai | Tamil Nadu | MPUA | 4681087.0 | 4343645.0 | 904.0 | 26.0 | 31.0 | HIGH | 210.0 | 179.0 | 0.536995 | 4231703.0 | 67.0 | 4684707.0 | 145606
6 | Chennai | Tamil nadu | T | 4646732.0 | NaN | 912.0 | 26.0 | 30.0 | MEDIUM | 145.0 | 177.0 | 0.093451 | 4237820.0 | 55.0 | 4684707.0 | 145606
7 | Kolkata | West Bengal | MPUA | 4486679.0 | 4572876.0 | 945.0 | 26.0 | 37.0 | NaN | NaN | NaN | 0.473585 | 4239912.0 | 82.0 | 1489500.0 | 44957
8 | Surat | Gujarat | MPUA | 4467797.0 | 2433835.0 | NaN | 27.0 | 26.0 | NaN | NaN | NaN | 0.809334 | 3797627.0 | 98.0 | 284973.0 | 23432
9 | Pune | Maharashtra | MPUA | 3124458.0 | 2538473.0 | NaN | 29.0 | 29.0 | NaN | NaN | NaN | 0.445902 | 2743274.0 | 50.0 | 4408916.0 | 218502

Last rows

Index | City | State | Type | Population [2011] | Popuation [2001] | Sex Ratio | Median Age | Avg Temp | SWM | Toilets Avl | Water Purity | H Index | Female Population | # of hospitals | Foreign Visitors | Covid Cases
777 | Shahbad | Haryana | M.C | 37289.0 | NaN | 829.0 | 24.0 | 33.0 | MEDIUM | 77.0 | 171.0 | 0.789984 | 30913.0 | 28.0 | 303118.0 | 1988
778 | Puranpur | Uttar Pradesh | M.B | 37233.0 | NaN | 886.0 | 28.0 | 35.0 | MEDIUM | 66.0 | 195.0 | 0.378812 | 32988.0 | 12.0 | 3104060.0 | 2478
779 | Nelamangala | Karnataka | T.M.C | 37232.0 | NaN | 931.0 | 24.0 | 34.0 | MEDIUM | 78.0 | 134.0 | 0.382265 | 34663.0 | 19.0 | 636502.0 | 2232
780 | Lalganj | Bihar | N.A.C | 37000.0 | NaN | 919.0 | 29.0 | 36.0 | LOW | 54.0 | 168.0 | 0.289709 | 34003.0 | 19.0 | 923737.0 | 2663
781 | Nakodar | Punjab | M.Cl | 36973.0 | NaN | 873.0 | 26.0 | 31.0 | LOW | 61.0 | 171.0 | 0.265890 | 32277.0 | 14.0 | 242367.0 | 2268
782 | Lunawada | Gujarat | M | 36954.0 | NaN | 846.0 | 23.0 | 28.0 | MEDIUM | 68.0 | 103.0 | 0.035280 | 31263.0 | 19.0 | 284973.0 | 1944
783 | Murshidabad | West Bengal | M | 36947.0 | NaN | 945.0 | 23.0 | 36.0 | MEDIUM | 62.0 | 136.0 | 0.056394 | 34915.0 | 22.0 | 1489500.0 | 2172
784 | Mahe | Puducherry | M | 36828.0 | NaN | 1019.0 | 28.0 | 28.0 | HIGH | 98.0 | 138.0 | 0.066752 | 37528.0 | 27.0 | 106153.0 | 2851
785 | Lanka | Assam | M.B | 36805.0 | NaN | 900.0 | 24.0 | 6.0 | MEDIUM | 63.0 | 145.0 | 0.627556 | 33125.0 | 15.0 | 24720.0 | 2158
786 | Rudauli | Uttar Pradesh | M.B | 36776.0 | NaN | 889.0 | 25.0 | 37.0 | HIGH | 51.0 | 181.0 | 0.313383 | 32694.0 | 30.0 | 3104060.0 | 2220